Skip to content

Scope fastokens adaptation to individual tokenizers#91

Open
xeophon wants to merge 1 commit into
mainfrom
codex/per-tokenizer-fastokens
Open

Scope fastokens adaptation to individual tokenizers#91
xeophon wants to merge 1 commit into
mainfrom
codex/per-tokenizer-fastokens

Conversation

@xeophon

@xeophon xeophon commented Jun 21, 2026

Copy link
Copy Markdown
Member

Overview

Load tokenizers through vanilla Transformers first, then replace only the returned tokenizer's backend with the same fastokens compatibility shim used by the global patch path.

This removes the process-wide patch window from renderers.load_tokenizer while preserving the fastokens-backed tokenizer returned to renderer callers.

Why

The previous path called fastokens.patch_transformers() around the complete AutoTokenizer.from_pretrained operation. Tokenizer loading can take hundreds of milliseconds or longer, and the patch mutates Transformers classes and decoder state process-wide for that entire interval. Unrelated tokenizer loads running concurrently could therefore receive fastokens unexpectedly, including its lack of offset mappings.

Per-instance adaptation keeps Transformers globals vanilla throughout the load. Concurrent renderer-pool slots still receive independent fastokens backends, while environment, harness, and application tokenizer calls remain unaffected.

Behavior

  • Fastokens encode/decode parity and the existing incompatible-model denylist remain unchanged.
  • use_fastokens=False, trusted revision pins, tokenizer mirrors, and renderer auto-resolution retain their existing behavior.
  • Unsupported fastokens backends return the already-loaded vanilla tokenizer instead of loading it a second time.
  • Offset-aware tokenizers now load directly through vanilla Transformers; the global-patch race workaround is no longer needed.
  • Fast-path announcement remains once per process under concurrent pool construction.

This narrower tokenizer ownership contract allows async consumers such as Verifiers to initialize renderer pools in worker threads without exposing temporary Transformers mutations to other coroutines.


Note

Medium Risk
Touches the default tokenizer load path used by renderer pools (performance and thread-safety contract), but behavior is intentionally preserved with added concurrency coverage.

Overview
Replaces process-wide fastokens.patch_transformers() with per-instance backend adaptation in load_tokenizer: tokenizers load through vanilla AutoTokenizer first, then only the returned object's _tokenizer is wrapped with fastokens._compat._TokenizerShim.

This removes the global patch window (and related stdout suppression / patch lock) that could affect concurrent AutoTokenizer.from_pretrained calls during slow loads. Encode/decode parity, FASTOKENS_INCOMPATIBLE, use_fastokens=False, and trusted-revision behavior stay the same; adaptation failures now keep the already-loaded vanilla tokenizer instead of reloading.

Offset-tokenizer loading in _get_offset_tokenizer drops the patch/unpatch race workaround and always uses vanilla Transformers, since fastokens no longer mutates globals.

Tests add a concurrent “slow load” case and refresh wording from “patch” to “adaptation.”

Reviewed by Cursor Bugbot for commit 7a670c5. Bugbot is set up for automated code reviews on this repo. Configure here.

Note

Scope fastokens adaptation to individual tokenizers instead of globally patching transformers

  • Replaces global fastokens.patch_transformers() / unpatch calls with per-tokenizer backend swapping via a new _adapt_tokenizer_with_fastokens function in renderers/base.py.
  • load_tokenizer now loads via vanilla AutoTokenizer first, then swaps only the returned tokenizer's _tokenizer backend to the fastokens _TokenizerShim; other AutoTokenizer calls elsewhere in the process are unaffected.
  • _get_offset_tokenizer no longer toggles any fastokens global state; it loads a vanilla fast tokenizer directly.
  • A one-time INFO log per process (guarded by _FASTOKENS_ANNOUNCE_LOCK) replaces the previous per-load stdout redirect.
  • Behavioral Change: concurrent AutoTokenizer.from_pretrained calls made while load_tokenizer is in progress now always get a vanilla tokenizer, eliminating the previous race window introduced by global patching.

Macroscope summarized 7a670c5.

@macroscopeapp

macroscopeapp Bot commented Jun 21, 2026

Copy link
Copy Markdown

Approvability

Verdict: Needs human review

This PR refactors the fastokens integration from process-wide patching to per-tokenizer adaptation, changing how core tokenizer loading infrastructure operates. While the change simplifies code and improves isolation, modifications to how this optimization is applied warrant human review.

You can customize Macroscope's approvability policy. Learn more.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7a670c50a4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread renderers/base.py
# restore the prior patch state. Never cache a non-offset tokenizer.
# This path deliberately loads through vanilla Transformers because
# fastokens adaptation is scoped to ``load_tokenizer``'s returned object.
offset_tok = _load_tokenizer_via_auto(load_name_or_path, **kwargs)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restore unpatched offset reloads under global fastokens

When the host process has already called fastokens.patch_transformers() (for example because another serving stack enables it globally), this reload is not vanilla: _load_tokenizer_via_auto() returns another fastokens shim, _has_offsets() stays false, and hand-coded renderers that call attribute_text_segments() raise instead of using the offset cache. The deleted fallback used to temporarily unpatch in exactly this case, so this path still needs to force the offset tokenizer reload through unpatched Transformers.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant